Patent abstract:
DECODER FOR GENERATING A MULTI-CHANNEL AUDIO SIGNAL, CODER FOR GENERATING A CODED REPRESENTATION OF A MULTI-CHANNEL AUDIO SIGNAL, METHOD OF GENERATING A MULTI-CHANNEL AUDIO SIGNAL, METHOD OF GENERATING A CODED REPRESENTATION OF A MULTI-CHANNEL AUDIO SIGNAL, AUDIO BIT STREAM FOR A MULTI-CHANNEL AUDIO SIGNAL AND STORAGE MEDIUM
A coder for a multichannel signal comprises a down-mixer (201, 203, 205) that generates a down-mix signal as a combination of at least a first and a second channel signal weighted by, respectively, a first and a second weight having different amplitudes for at least some time-frequency intervals. In addition, a circuit (201, 203, 209) generates up-mix parameter data characterizing a relationship between the channel signals as well as characterizing the weights. At the decoder, a circuit generates weight estimates for the encoder weights from the up-mix parameter data, and an up-mixer (407) recreates the multichannel audio signal by up-mixing the down-mix signal in response to the up-mix parameter data, the first weight estimate and the second weight estimate. Up-mixing (...).
Publication number: BR112012011084B1
Application number: R112012011084-5
Filing date: 2010-11-05
Publication date: 2020-12-08
Inventors: Albertus Cornelis Den Brinker; Erik Gosuinus Petrus Schuijers; Arnoldus Werner Johannes Oomen
Applicant: Koninklijke Philips N.V.
Main IPC number:
Patent description:

FIELD OF THE INVENTION
The invention relates to parametric encoding and decoding and, in particular, to parametric encoding and decoding of multichannel signals using a down-mix and parametric up-mix data.
BACKGROUND OF THE INVENTION
Digital coding of various source signals has become increasingly important over the past decades as digital signal representation and communication have increasingly replaced analog representation and communication. For example, the distribution of media content, such as video and music, is increasingly based on digital content coding.
The encoding of multichannel signals can be performed by down-mixing the multichannel signal to fewer channels and encoding and transmitting these. For example, a stereo signal can be down-mixed to a mono signal, which is then encoded. In parametric multichannel coding, parameter data is additionally generated, which supports an up-mixing of the down-mix to recreate (approximations of) the original multichannel signal. Examples of multichannel systems using down-mixing / up-mixing and associated parameter data include the technique known as Parametric Stereo (PS) and its extension to multichannel parametric coding (for example, MPEG Surround, MPS).
In its simplest form, down-mixing a stereo signal to a mono signal can be accomplished simply by averaging the two stereo channels, that is, by generating the mid or sum signal. This mono signal can then be distributed and can additionally be used directly as a mono signal. In coding approaches such as that used by Parametric Stereo, stereo cues are provided together with the down-mix signal. Specifically, parameters for level differences between the channels, time or phase differences between the channels, and coherence or correlation between the channels are determined per time-frequency tile (which typically corresponds to a Bark or ERB band division of the frequency axis and a fixed uniform segmentation of the time axis). These data are typically distributed together with the down-mix signal and allow an accurate re-creation of the original stereo signal by an up-mixing that depends on the parameters.
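The per-tile parameters mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly used definitions (level difference as an energy ratio in dB, phase difference as the angle of the cross-correlation, coherence as the normalized cross-correlation magnitude); the function name `stereo_parameters` and the input layout are illustrative assumptions, and the text itself does not prescribe these exact formulas.

```python
import cmath
import math


def stereo_parameters(left, right):
    """Estimate ILD (dB), IPD (radians) and ICC for one time-frequency tile.

    `left` and `right` are equal-length lists of complex sub-band samples.
    Assumes neither channel is silent in the tile.
    """
    e_l = sum(abs(x) ** 2 for x in left)    # energy of the left channel
    e_r = sum(abs(x) ** 2 for x in right)   # energy of the right channel
    # Cross-correlation between the channels (left times conjugate of right).
    cross = sum(x * y.conjugate() for x, y in zip(left, right))
    ild_db = 10.0 * math.log10(e_l / e_r)   # level difference
    ipd = cmath.phase(cross)                # phase difference
    icc = abs(cross) / math.sqrt(e_l * e_r) # coherence, between 0 and 1
    return ild_db, ipd, icc


# Right channel is the left channel scaled by 0.5 and rotated by +90 degrees.
left = [1 + 0j, 0 + 1j]
right = [0.5j * x for x in left]
ild_db, ipd, icc = stereo_parameters(left, right)
```

In this example the right channel is a scaled and rotated copy of the left one, so the coherence is 1, the level difference is about 6 dB, and the phase difference is a quarter turn.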
However, it is well known that forming the average signal typically results in somewhat dull signals, that is, signals with reduced brightness / high-frequency content. The reason is that, for typical audio signals, the different channels tend to be highly correlated at low frequencies, but not at high frequencies. The direct sum of the two stereo channels effectively suppresses the signal components that are not phase-aligned. Indeed, for frequency sub-bands where the left and right signals are completely out of phase, the resulting average signal is zero.
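The cancellation described above can be reproduced with a few lines of Python. The helper names and the sample values are illustrative assumptions chosen only to make the effect visible.

```python
def passive_downmix(left, right):
    """Average the two channels sample by sample (passive mono down-mix)."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def energy(signal):
    """Sum of squared magnitudes of the samples."""
    return sum(abs(v) ** 2 for v in signal)


left = [1.0, -0.5, 0.25, 0.75]
right_in_phase = list(left)               # identical to the left channel
right_out_of_phase = [-v for v in left]   # same magnitude, opposite sign

mix_in_phase = passive_downmix(left, right_in_phase)
mix_out_of_phase = passive_downmix(left, right_out_of_phase)

print(energy(mix_in_phase))       # same energy as the left channel
print(energy(mix_out_of_phase))   # zero: the out-of-phase content is lost
```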
One solution that has been proposed is to phase-align the channels before the sum is formed. Thus, ideally, the left and right signals are compensated for any phase difference in the frequency domain (corresponding to a time difference in the time domain) before being combined. However, such an approach tends to be complex and can introduce algorithmic delay. In addition, in practice, the approach tends not to provide ideal quality. For example, if the phase difference between the channels is measured, there is an ambiguity as to whether the phase of the left channel should be aligned with the right channel or vice versa. Furthermore, trying to shift the phase of both channels equally also leads to ambiguity. In addition, the phase difference is numerically ill-conditioned when the correlation is low, resulting in a less accurate and less robust system. In general, these issues tend to lead to noticeable artifacts in a phase-aligned down-mix. Typically, modulations of tonal components result from this approach.
As a consequence, most practical systems tend to use a so-called passive down-mix, generated simply as the average of the left and right signals. Unfortunately, passive down-mixing also has some associated disadvantages. One of these is that the acoustic energy can be substantially reduced, and even completely lost, for out-of-phase signals. One proposed method to address this is to use a so-called active down-mix, where the down-mix is scaled to have the same energy as the original signals. Another proposed solution is to provide energy compensation on the decoder side. However, such compensations tend to operate at a rather global level and do not differentiate between tonal components (where compensation is needed) and noise (where it is not). Furthermore, in both passive and active down-mix approaches, problems occur for signals that are close to out of phase. Indeed, fully out-of-phase components are completely absent from the down-mix signal.
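A minimal Python sketch of an active down-mix follows. It assumes that "the same energy as the original signals" means the sum of the two channel energies; the function name `active_downmix` and the convention of returning `None` as the gain when nothing can be scaled are illustrative assumptions.

```python
import math


def active_downmix(left, right):
    """Passive average scaled so its energy matches the summed channel energies.

    Returns (down-mix, gain). When the passive down-mix is exactly zero,
    no gain can restore the lost energy, so the gain is reported as None.
    """
    passive = [(l + r) / 2 for l, r in zip(left, right)]
    target = sum(abs(v) ** 2 for v in left) + sum(abs(v) ** 2 for v in right)
    actual = sum(abs(v) ** 2 for v in passive)
    if actual == 0.0:
        return passive, None          # fully out of phase: nothing to scale
    gain = math.sqrt(target / actual)
    return [gain * v for v in passive], gain


mix_ok, gain_ok = active_downmix([1.0, 0.5], [1.0, 0.5])        # in phase
mix_lost, gain_lost = active_downmix([1.0, 0.5], [-1.0, -0.5])  # out of phase
```

The second call shows the limitation noted in the text: scaling restores the energy level but cannot bring back components that cancelled in the sum.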
Thus, an improved system for multichannel parametric encoding / decoding would be advantageous and, in particular, a system allowing greater flexibility, easier operation, easier implementation, reduced complexity, improved robustness, improved encoding of out-of-phase signal components, a reduced data rate to quality ratio and / or better performance would be advantageous.
SUMMARY OF THE INVENTION
Therefore, the invention preferably seeks to mitigate, alleviate or eliminate one or more of the above disadvantages, either individually or in combination.
According to one aspect of the invention, a decoder is provided for generating a multichannel audio signal, the decoder comprising: a first receiver for receiving a down-mix being a combination of at least a first channel signal weighted by a first weight and a second channel signal weighted by a second weight, the first weight and the second weight having different amplitudes for at least some time-frequency intervals; a second receiver for receiving up-mix parameter data characterizing a relationship between the first channel signal and the second channel signal; a circuit for generating a first weight estimate for the first weight and a second weight estimate for the second weight from the up-mix parameter data; and an up-mixer for generating the multichannel audio signal by up-mixing the down-mix in response to the up-mix parameter data, the first weight estimate and the second weight estimate, the up-mixing being dependent on an amplitude of at least one of the first weight estimate and the second weight estimate.
The invention can allow improved and / or facilitated operation in various situations. The approach can typically mitigate out-of-phase problems and / or the disadvantages of phase-aligned coding. The approach can often allow better audio quality without requiring a higher data rate. A more robust encoding / decoding system can often be achieved and, in particular, encoding / decoding may be less sensitive to specific signal conditions. The approach may allow a low-complexity implementation and / or have a low requirement for computational resources.
Processing can be based on sub-bands. Encoding and decoding can be performed in frequency sub-bands and time intervals. In particular, the first weight and the second weight can be provided for each frequency sub-band and for each (time) segment, together with a down-mix signal value. The down-mix can be generated individually in each sub-band by combining the frequency sub-band values of the first and second channel signals weighted by the weights for the sub-band. The weights (and thus the weight estimates) for a sub-band have different amplitudes (and thus different energies) for at least some values of the first and second channel signals. Each time-frequency interval can specifically correspond to an encoding / decoding time segment and a frequency sub-band.
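The per-sub-band combination can be sketched in Python as follows. The function name `weighted_downmix`, the two-band example and the specific weight values are illustrative assumptions; the only property taken from the text is that each sub-band value is a sum of the two channel values scaled by that band's weights.

```python
def weighted_downmix(left_bands, right_bands, w1, w2):
    """Down-mix one time segment: s[k] = w1[k] * l[k] + w2[k] * r[k] per band k."""
    return [a * l + b * r
            for l, r, a, b in zip(left_bands, right_bands, w1, w2)]


# Band 0: channels in phase, equal weights. Band 1: channels out of phase,
# so unequal weight amplitudes (1.0 and 0.0) keep the left channel instead
# of letting the two channels cancel.
left = [1 + 0j, 0.5 + 0.5j]
right = [1 + 0j, -0.5 - 0.5j]
w1 = [0.5, 1.0]
w2 = [0.5, 0.0]

downmix = weighted_downmix(left, right, w1, w2)
```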
The up-mix parameter data comprises parameters that can be used to generate, from the down-mix, an up-mix corresponding to the original multichannel signal. The up-mix parameter data can specifically comprise Interchannel Level Difference (ILD), Interchannel Coherence / Correlation (IC / ICC), Interchannel Phase Difference (IPD) and / or Interchannel Time Difference (ITD) parameters. The parameters can be provided for frequency sub-bands and with an appropriate update interval. In particular, a set of parameters can be provided for each of a plurality of frequency bands for each encoding / decoding time segment. The frequency bands and / or time segments used for the parameter data may be identical to those used for the down-mix, but need not be. For example, the same frequency sub-bands can be used for lower frequencies, but not for higher frequencies. Thus, the time-frequency resolution for the first and second weights and for the parameters of the up-mix parameter data need not be identical.
One of the first and second weights (and thus the corresponding weight estimate) can, for some signal values, be zero in a sub-band. The combination of the first and second channel signals can be a linear combination, such as specifically a linear sum, with each signal being scaled by the corresponding weight prior to the summation.
The multichannel signal comprises two or more channels. Specifically, the multichannel signal can be a two-channel (stereo) signal.
The approach can, in particular, mitigate out-of-phase problems and provide a more robust system while maintaining low complexity and a low data rate. Specifically, the approach can allow different weights (with different amplitudes) to be determined without requiring additional data to be sent. Thus, improved audio quality can be achieved without requiring a higher data rate.
The determination of the first and second weight estimates can use the same approach that is (assumed to be) used to determine the first and / or the second weight in the encoder. In various embodiments, one or both of the weights / weight estimates can be determined based on an assumed function for determining the weight / weight estimate from the parameters of the up-mix parameter data.
The decoder may not have explicit information on the exact characteristics of the received signal, but may simply operate on the assumption that the down-mix is a combination of at least a first channel signal weighted by a first weight and a second channel signal weighted by a second weight, where the first weight and the second weight have different amplitudes for at least some time-frequency intervals. A time-frequency interval can correspond to a time interval, a frequency interval or the combination of a time interval and a frequency interval, such as, for example, a frequency sub-band in a time segment.
According to an optional feature of the invention, the circuit is arranged to generate the first weight estimate and the second weight estimate with different relationships to at least some parameters of the parameter data for at least some time-frequency intervals.
This can provide an improved encoding / decoding system and can, in particular, mitigate out-of-phase problems and provide a more robust system. The functions for determining the weight estimates from the parameters can thus be different for the two weights, such that the same parameters result in weight estimates with different amplitudes.
The encoder can therefore be arranged to determine the first weight and the second weight to have different relationships with at least some parameters of the parameter data for at least some time-frequency intervals.
A time-frequency interval can correspond to a time interval, a frequency interval, or the combination of a time interval and a frequency interval, such as, for example, a frequency sub-band in a time segment.
According to an optional feature of the invention, the up-mixer is arranged to determine at least one of the first weight estimate and the second weight estimate as a function of an energy parameter of the up-mix parameter data, the energy parameter being indicative of a relative energy characteristic of the first channel signal and the second channel signal.
This can provide improved performance and / or easier operation and / or implementation. Energy considerations can be particularly relevant for determining suitable weights, and the weights can therefore be suitably represented by, and correlated with, the energy parameters of the up-mix parameter data. Thus, the use of energy parameters to determine weights / weight estimates allows an efficient communication of information, allowing weights / weight estimates with different amplitudes to be determined. In particular, the use of energy parameters allows an efficient determination of the amplitude of the weights, rather than merely the phase of the weights. Energy parameters can specifically provide information on energy (or, equivalently, power) characteristics of the first channel signal, the second channel signal, a difference between them, or a combined signal (such as a cross-power characteristic).
In accordance with an optional feature of the invention, the energy parameter is at least one of: an Interchannel Intensity Difference (IID) parameter; an Interchannel Level Difference (ILD) parameter; and an Interchannel Coherence / Correlation (IC / ICC) parameter.
This can provide particularly advantageous performance and can provide improved backward compatibility.
According to an optional feature of the invention, the up-mix parameter data comprises an accuracy indication for a relationship between, on the one hand, the first weight and the second weight and, on the other hand, the up-mix parameter data, and the decoder is arranged to generate at least one of the first weight estimate and the second weight estimate in response to the accuracy indication.
This can provide improved performance in a variety of situations, and can, in particular, allow for an improved determination of more accurate weight estimates for different signal conditions.
The accuracy indication can be indicative of the accuracy that can be obtained for a weight estimate when calculating it from the parameter data. The accuracy indication can specifically indicate whether or not the attainable accuracy meets an accuracy criterion. For example, the accuracy indication can be a binary indication simply indicating whether or not the parameter data can be used. The accuracy indication can comprise an individual value for each sub-band or can comprise one or more indications applicable to a plurality of, or even all, sub-bands.
The decoder can be arranged to generate weight estimates from the parameter data only if the accuracy indication is indicative of sufficient accuracy.
According to an optional feature of the invention, at least one of the first weight and the second weight, for at least one frequency interval, has a finer time-frequency resolution than a corresponding parameter of the up-mix parameter data.
This can provide better performance in several situations, since more accurate weights can be used to generate the down-mix, while at the same time allowing the data rate to be kept low.
Similarly, at least one of the first weight estimate and the second weight estimate, for at least one frequency interval, may have a finer time-frequency resolution than a corresponding parameter of the up-mix parameter data.
The corresponding parameter is the parameter that covers the same time-frequency interval. In several embodiments, the decoder can proceed to generate the estimate for the first and / or second weight based on the corresponding parameter. Thus, although the parameter represents signal characteristics over a longer time interval and / or a wider frequency interval, it can still be used as an approximation for the time and / or frequency interval of the weight.
According to an optional feature of the invention, the up-mixer is arranged to generate an Overall Phase Difference (OPD) value in response to the parameter data and to perform the up-mixing in response to the OPD value, the OPD value being dependent on the first weight estimate and the second weight estimate.
This can allow efficient decoding with high quality. This can, in some situations, provide improved backward compatibility. The OPD (Overall Phase Difference) is individually dependent on both the first and second weight estimates (including their amplitudes) and can specifically be defined as a function of the weights, that is, $\mathrm{OPD} = f(w_1, w_2)$. The up-mix can, for example, be performed substantially as:
where $s$ is the down-mix signal and $s_d$ is a decorrelated signal generated by the decoder from the down-mix signal. $c_1$ and $c_2$ are gain parameters that are used to reinstate the correct level difference between the left and right output channels, and $\alpha$ and $\beta$ are values that can be generated from the up-mix parameter data.
The OPD value can, for example, be generated substantially as:
or, for example, substantially as:
where $w_1$ and $w_2$ are the first and second weights respectively, and the down-mix signal is generated as $s = w_1 \cdot l + w_2 \cdot r$.
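The exact OPD expressions are not reproduced in the text above. As a hedged illustration of how an OPD value can depend on the weight amplitudes, the Python sketch below assumes that the OPD is the phase of the correlation between the left channel and the down-mix $s = w_1 \cdot l + w_2 \cdot r$, and that the channel statistics are reconstructed from ILD, IPD and ICC. The function name `opd_from_weights` and this particular definition are assumptions, not the formula of the document.

```python
import cmath


def opd_from_weights(ild_db, ipd, icc, w1, w2):
    """Phase of E[l * conj(s)] for s = w1*l + w2*r, from ILD/IPD/ICC.

    Energies are relative: the right channel energy is normalized to 1.
    """
    c = 10.0 ** (ild_db / 20.0)              # amplitude ratio left / right
    e_l = c * c                              # relative left-channel energy
    cross = icc * c * cmath.exp(1j * ipd)    # estimate of E[l * conj(r)]
    # E[l * conj(w1*l + w2*r)] = conj(w1) * E[|l|^2] + conj(w2) * E[l * conj(r)]
    corr = complex(w1).conjugate() * e_l + complex(w2).conjugate() * cross
    return cmath.phase(corr)


opd_left_only = opd_from_weights(3.0, 0.7, 0.8, 1.0, 0.0)
opd_right_only = opd_from_weights(3.0, 0.7, 0.8, 0.0, 1.0)
opd_equal = opd_from_weights(0.0, cmath.pi / 2, 1.0, 0.5, 0.5)
```

Under this assumed definition, a down-mix consisting of the left channel alone gives an OPD of zero, a down-mix consisting of the right channel alone gives an OPD equal to the IPD, and intermediate weight amplitudes give values in between.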
According to an optional feature of the invention, the up-mixing is independent of the amplitude of at least one of the first weight estimate and the second weight estimate, except through the OPD value.
This can allow better performance and / or operation.
In accordance with an optional feature of the invention, the up-mixer is arranged to: generate a decorrelated signal from the down-mix, the decorrelated signal being decorrelated from the down-mix; and up-mix the down-mix by applying a matrix multiplication to the down-mix and the decorrelated signal, wherein the coefficients of the matrix multiplication are dependent on the first weight estimate and the second weight estimate.
This can allow efficient decoding with high quality. This can, in some situations, provide improved backward compatibility.
The matrix multiplication can include a prediction coefficient, representing a prediction of a difference signal from the down-mix signal. The prediction coefficient can be determined from the weights. The matrix multiplication can also include a decorrelation scaling factor, representing a contribution to the difference signal from the decorrelated signal. The decorrelation scaling factor can be determined from the weights.
The coefficients of matrix multiplication can be determined from the estimated weights. The different coefficients can have different dependencies on the first and second weights, and the first and second weights can affect each coefficient differently.
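A prediction-based up-mix of this kind can be sketched in Python. The sketch assumes one particular difference signal, $d = w_2^* l - w_1^* r$, which together with $s = w_1 l + w_2 r$ can be inverted exactly; the decoder approximates $d$ as $\alpha s + \beta s_d$. The function name `predictive_upmix` and the chosen form of $d$ are assumptions for illustration, and the document's own matrix is not reproduced here.

```python
def predictive_upmix(s, s_d, w1, w2, alpha, beta):
    """Up-mix one sub-band sample from the down-mix s and decorrelated s_d.

    Assumes s = w1*l + w2*r and a difference signal d = conj(w2)*l - conj(w1)*r
    that is approximated by alpha*s + beta*s_d.
    """
    w1, w2 = complex(w1), complex(w2)
    d_hat = alpha * s + beta * s_d            # predicted difference signal
    norm = abs(w1) ** 2 + abs(w2) ** 2        # determinant magnitude of the mix
    left = (w1.conjugate() * s + w2 * d_hat) / norm
    right = (w2.conjugate() * s - w1 * d_hat) / norm
    return left, right


# Round trip with a perfectly predictable difference signal.
l, r = 1 + 2j, -0.5 + 0.3j
w1, w2 = 0.8 + 0j, 0.3j
s = w1 * l + w2 * r
d = w2.conjugate() * l - w1.conjugate() * r
l_hat, r_hat = predictive_upmix(s, 0j, w1, w2, alpha=d / s, beta=0.0)
```

Because every coefficient multiplying `s` and `s_d` contains both weights, the two weights affect the output channels differently, which is the dependency the paragraph above describes.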
The up-mix can specifically be performed substantially as:
where $\alpha$ is the prediction factor, $\beta$ is the decorrelation scaling factor, $s$ is the down-mix, $s_d$ is a decorrelated signal generated by the decoder, $w_1$ and $w_2$ are the first and second weights respectively, and $^*$ denotes complex conjugation. $\alpha$ and / or $\beta$ can be determined from the estimated weights and the parameter data, for example, substantially as:

According to an optional feature of the invention, the up-mixer is arranged to determine the first weight estimate by: determining a first energy measure indicative of an energy of a non-phase-aligned combination of the first channel signal and the second channel signal in response to the up-mix parameter data; determining a second energy measure indicative of an energy of a phase-aligned combination of the first channel signal and the second channel signal in response to the up-mix parameter data; determining a first measure of the first energy measure relative to the second energy measure; and determining the first weight estimate in response to the first measure.
This can provide a highly advantageous determination of the first weight estimate. The feature can provide improved performance and / or easier operation.
The first energy measure can be an indication of the energy of a sum of the first channel signal and the second channel signal. The second energy measure can be an indication of the energy of a coherent sum of the first channel signal and the second channel signal. The first measure can represent an indication of the degree of phase cancellation between the first channel signal and the second channel signal. The first and / or the second energy measure can be any indication of an energy and can specifically be a normalized energy measure, for example, relative to an energy of the first and / or the second channel signal.
The first measure can, for example, be determined as a ratio between the first energy measure and the second energy measure. For example, the first measure can be determined substantially as:

The first weight can be determined as a non-linear and / or monotonic function of the first measure. The second weight can, for example, be determined from the first weight, for example, so that the sum of the amplitudes of the two weights has a predetermined value. In some embodiments, the generation of the first and / or the second weight may include a normalization of the down-mix energy. For example, the weights can be scaled to result in a down-mix with substantially the same energy as the sum of the energy of the left channel signal and the energy of the right channel signal.
Weights can specifically be generated substantially as follows:
$g_1 = 2 - q$, $g_2 = q$, resulting in $w_1 = g_1 \cdot c$, $w_2 = g_2 \cdot c$, where $c$ is selected to provide the desired energy normalization.
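The steps above can be sketched in Python. The sketch assumes that the channel statistics are reconstructed from ILD, IPD and ICC with the right-channel energy normalized to 1, that $q$ is simply the ratio of the two energy measures (the text only requires a non-linear and / or monotonic function, so this identity mapping is an assumption), and that $c$ normalizes the down-mix energy to the sum of the channel energies. The function name `weights_from_parameters` is hypothetical.

```python
import math


def weights_from_parameters(ild_db, ipd, icc):
    """Derive (w1, w2) from transmitted parameters only, with no side information."""
    c_lr = 10.0 ** (ild_db / 20.0)        # amplitude ratio left / right
    e_l, e_r = c_lr * c_lr, 1.0           # relative channel energies
    cross_mag = icc * c_lr                # magnitude of the cross-correlation

    # First energy measure: energy of the plain sum l + r.
    e_sum = e_l + e_r + 2.0 * cross_mag * math.cos(ipd)
    # Second energy measure: energy of the phase-aligned (coherent) sum.
    e_aligned = e_l + e_r + 2.0 * cross_mag

    q = e_sum / e_aligned                 # assumed mapping: q equals the ratio
    g1, g2 = 2.0 - q, q

    # Energy of g1*l + g2*r, then scale so it matches e_l + e_r.
    e_mix = g1 * g1 * e_l + g2 * g2 * e_r + 2.0 * g1 * g2 * cross_mag * math.cos(ipd)
    norm = math.sqrt((e_l + e_r) / e_mix) if e_mix > 1e-12 else 1.0
    return g1 * norm, g2 * norm


w_in_phase = weights_from_parameters(0.0, 0.0, 1.0)
w_out_of_phase = weights_from_parameters(0.0, math.pi, 1.0)
```

For identical in-phase channels the two weights are equal, while for fully out-of-phase channels the second weight drops to zero and the first weight carries the whole down-mix. Since the function only uses transmitted parameters, an encoder and a decoder running the same function arrive at the same weights.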
The encoder can perform the same operations and derivation of the first weight (and possibly the second weight) as described with reference to the decoder.
According to an optional feature of the invention, the up-mixer is arranged to determine the first weight estimate by: for each of a plurality of pairs of predetermined values of the first weight and the second weight, determining, in response to the parameter data, an energy measure indicative of the energy of a down-mix corresponding to the pair of predetermined values; and determining the first weight estimate in response to the energy measures and the pairs of predetermined values.
This can provide a highly advantageous determination of the first weight estimate. The feature can provide improved performance and / or easier operation.
The decoder can assume that the down-mix is a combination of a plurality of down-mixes using predetermined fixed weights, with the combination being dependent on the signal energy of each down-mix. Thus, the first weight estimate (and / or the second weight estimate) can be determined to correspond to a combination of predetermined weights, where the combination of the individual predetermined weights is determined in response to the estimated energy (or, equivalently, power) of each of the down-mixes.
The estimated energy for each down-mix can be determined based on the up-mix parameter data.
Specifically, the first weight estimate can be determined by combining the pairs of predetermined values with a weight of each pair of predetermined values being dependent on the energy measure for the pair of predetermined values.
The energy measurement for a pair of predetermined values can specifically be determined substantially as:
where $m$ is an index for the pair of predetermined weights and $M(m, k)$ represents the $k$-th weight of the $m$-th pair of predetermined weights.
In some embodiments, a bias can be introduced for one or more of the weight pairs. For example, the energy measure can be determined as:
where $b(m)$ is a bias function, which can introduce an additional bias for one or more of the down-mixes. The bias function can be a function of the up-mix parameter data.
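The energy-weighted combination of predetermined weight pairs can be sketched in Python. The energy expression used here is the energy of the down-mix $M(m,1) \cdot l + M(m,2) \cdot r$ written in terms of ILD, IPD and ICC, with an optional additive bias per pair; the final combination rule (an energy-weighted average of the pairs) and the function name `combine_weight_pairs` are assumptions, since the document's formulas are not reproduced in the text.

```python
import cmath


def combine_weight_pairs(ild_db, ipd, icc, pairs, bias=None):
    """Combine predetermined (w1, w2) pairs, weighted by estimated down-mix energy."""
    c = 10.0 ** (ild_db / 20.0)
    e_l, e_r = c * c, 1.0                     # relative channel energies
    cross = icc * c * cmath.exp(1j * ipd)     # estimate of E[l * conj(r)]
    if bias is None:
        bias = [0.0] * len(pairs)             # no bias unless supplied

    energies = []
    for (m1, m2), b in zip(pairs, bias):
        m1, m2 = complex(m1), complex(m2)
        # Energy of m1*l + m2*r, plus the optional bias for this pair.
        e = (abs(m1) ** 2 * e_l + abs(m2) ** 2 * e_r
             + 2.0 * (m1 * m2.conjugate() * cross).real)
        energies.append(e + b)

    total = sum(energies)
    if total <= 0.0:                          # degenerate case: use the first pair
        return complex(pairs[0][0]), complex(pairs[0][1])
    w1 = sum(e * complex(p[0]) for e, p in zip(energies, pairs)) / total
    w2 = sum(e * complex(p[1]) for e, p in zip(energies, pairs)) / total
    return w1, w2


pairs = [(1, 1), (1, -1)]                     # sum and difference down-mixes
w_in = combine_weight_pairs(0.0, 0.0, 1.0, pairs)
w_out = combine_weight_pairs(0.0, cmath.pi, 1.0, pairs)
```

With the two example pairs, in-phase channels select the sum down-mix and out-of-phase channels select the difference down-mix, so the combined down-mix never cancels completely.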
According to one aspect of the invention, an encoder is provided for generating an encoded representation of a multichannel audio signal comprising at least a first channel and a second channel, the encoder comprising: a down-mixer for generating a down-mix as a combination of at least a first channel signal of the first channel weighted by a first weight and a second channel signal of the second channel weighted by a second weight, the first weight and the second weight having different amplitudes for at least some time-frequency intervals; a circuit for generating up-mix parameter data characterizing a relationship between the first channel signal and the second channel signal, the up-mix parameter data additionally characterizing the first weight and the second weight; and a circuit for generating the encoded representation to include the down-mix and the up-mix parameter data.
This can provide a particularly advantageous encoding, which can be compatible with the decoder described above. It will be appreciated that most of the comments provided with reference to the decoder apply equally to the encoder as appropriate.
The first and second weights may not be included in the up-mix parameter data or, indeed, may not be communicated or distributed by the encoder at all. The down-mix can be encoded according to any suitable coding algorithm.
According to an optional feature of the invention, the down-mixer is arranged to: determine a first energy measure indicative of an energy of a non-phase-aligned combination of the first channel signal and the second channel signal; determine a second energy measure indicative of an energy of a phase-aligned combination of the first channel signal and the second channel signal; determine a first measure of the first energy measure relative to the second energy measure; and determine the first weight and the second weight in response to the first measure. This can provide particularly advantageous coding.
According to an optional feature of the invention, the down-mixer is arranged to: for each of a plurality of predetermined value pairs of the first weight and the second weight, generate a down-mix; for each of the down-mixes, determine a measure of energy indicative of a down-mix energy; and generate the down-mix by combining the down-mixes in response to energy measurements. This can provide particularly advantageous coding.
According to one aspect of the invention, a method of generating a multichannel audio signal is provided, the method comprising: receiving a down-mix being a combination of at least a first channel signal weighted by a first weight and a second channel signal weighted by a second weight, the first weight and the second weight having different amplitudes for at least some time-frequency intervals; receiving up-mix parameter data characterizing a relationship between the first channel signal and the second channel signal; generating a first weight estimate for the first weight and a second weight estimate for the second weight from the up-mix parameter data; and generating the multichannel audio signal by up-mixing the down-mix in response to the up-mix parameter data, the first weight estimate and the second weight estimate, the up-mixing being dependent on the amplitude of at least one of the first weight estimate and the second weight estimate.
According to one aspect of the invention, a method of generating an encoded representation of a multichannel audio signal comprising at least a first channel and a second channel is provided, the method comprising: generating a down-mix as a combination of at least a first channel signal of the first channel weighted by a first weight and a second channel signal of the second channel weighted by a second weight, the first weight and the second weight having different amplitudes for at least some time-frequency intervals; generating up-mix parameter data characterizing a relationship between the first channel signal and the second channel signal, the up-mix parameter data additionally characterizing the first weight and the second weight; and generating the encoded representation to include the down-mix and the up-mix parameter data.
According to one aspect of the invention, an audio bit stream is provided for a multichannel audio signal, the bit stream comprising: a down-mix being a combination of at least a first channel signal weighted by a first weight and a second channel signal weighted by a second weight, the first weight and the second weight having different amplitudes for at least some time-frequency intervals; and up-mix parameter data characterizing a relationship between the first channel signal and the second channel signal, the up-mix parameter data additionally characterizing the first weight and the second weight. The first and second weights may not be included in the bit stream.
These and other aspects, features and advantages of the invention will be apparent from, and elucidated with reference to, the embodiment(s) described below.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which:
Figure 1 is an illustration of an audio distribution system according to some embodiments of the invention;
Figure 2 is an illustration of elements of an audio encoder according to some embodiments of the invention;
Figure 3 is an illustration of elements of an audio encoder according to some embodiments of the invention; and
Figure 4 is an illustration of elements of an audio decoder according to some embodiments of the invention.
DETAILED DESCRIPTION OF SOME EMBODIMENTS OF THE INVENTION
The following description focuses on embodiments of the invention applicable to encoding and decoding a multichannel signal with two channels (i.e., a stereo signal). Specifically, the description focuses on the down-mixing of a stereo signal to a mono down-mix and associated parameters, and on the associated up-mixing. However, it will be appreciated that the invention is not limited to this application, but can be applied to many other multichannel (including stereo) systems, such as, for example, MPEG Surround and Parametric Stereo as in HE-AAC v2.
Figure 1 illustrates a transmission system 100 for communicating an audio signal according to some embodiments of the invention. The transmission system 100 comprises a transmitter 101, which is coupled to a receiver 103 via a network 105, which can be specifically the Internet.
In the specific example, transmitter 101 is a signal recording device and receiver 103 is a signal reproducing device, but it will be appreciated that, in other embodiments, a transmitter and receiver can be used in other applications and for other purposes. For example, transmitter 101 and / or receiver 103 may be part of a transcoding functionality and may, for example, provide interfacing to other signal sources or destinations.
In the specific example where a signal recording function is supported, transmitter 101 comprises a digitizer 107 that receives an analog signal, which is converted to a multichannel PCM ("Pulse Code Modulated") signal by sampling and analog-to-digital conversion.
The digitizer 107 is coupled to the encoder 109 of Figure 1, which encodes the multichannel PCM signal according to an encoding algorithm. The encoder 109 is coupled to a network transmitter 111, which receives the encoded signal and interfaces with the Internet 105. The network transmitter can transmit the encoded signal to the receiver 103 via the Internet 105.
The receiver 103 comprises a network receiver 113, which interfaces with the Internet 105 and which is arranged to receive the encoded signal from the transmitter 101.
The network receiver 113 is coupled to a decoder 115. The decoder 115 receives the encoded signal and decodes it according to a decoding algorithm.
In the specific example in which a signal reproduction function is supported, the receiver 103 additionally comprises a signal reproducer 117, which receives the decoded audio signal from decoder 115 and presents it to the user. Specifically, the signal reproducer 117 may comprise a digital-to-analog converter, amplifiers and speakers as required to reproduce the decoded multichannel audio signal.
Figure 2 illustrates encoder 109 in greater detail. The received left and right signals are first converted to the frequency domain. In the specific example, the signal on the right is fed to a first frequency subband converter 201, which converts the right signal to a plurality of frequency subbands. Similarly, the signal on the left is fed to a second frequency sub-band converter 203, which converts the left signal into a plurality of frequency sub-bands.
The left and right subband signals are fed to a down-mix processor 205, which is arranged to generate a down-mix of the stereo signals, as will be described in greater detail later. In the specific example, the down-mix is a mono signal, which is generated by combining the individual sub-bands of the right and left signals to generate a mono sub-band down-mix signal in the frequency domain. Thus, the down-mix is performed on a sub-band basis. The down-mix processor 205 is coupled to the down-mix encoder 207, which receives the mono down-mix signal and encodes it according to a suitable coding algorithm. The mono down-mix signal transferred to down-mix encoder 207 can be a sub-band signal in the frequency domain or it can first be transformed back into the time domain.
The encoder 109 additionally comprises a parameter processor 209, which generates parametric spatial data that can be used by the decoder 115 to perform up-mix on the down-mix signal to a multichannel signal.
Specifically, the parameter processor 209 can group the frequency subbands into Bark or ERB subbands, for which the cues are extracted. The parameter processor 209 can specifically use a standard approach to generate the parameter data. In particular, algorithms known from the techniques of Parametric Stereo and MPEG Surround can be used. Thus, the parameter processor 209 can generate the Inter-channel Level Difference (ILD, also written IID), the Inter-channel Coherence / Correlation (IC / ICC), the Inter-channel Phase Difference (IPD) or the Inter-channel Time Difference (ITD) for each parameter sub-band, as will be known to the person skilled in the art.
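As a rough illustration of the parameter extraction, the following Python sketch estimates the inter-channel cues for one parameter band from complex subband samples. The function name `estimate_cues`, the use of a linear energy ratio for `iid`, and the conjugation convention for the cross term are assumptions made for this sketch and are not taken from the cited standards.

```python
import numpy as np


def estimate_cues(l, r, eps=1e-12):
    """Estimate (iid, icc, ipd) for one parameter band.

    l, r: complex subband samples of the left and right channels.
    Assumed conventions: iid is the linear energy ratio E_l / E_r,
    icc is the normalized magnitude of the cross term, and ipd is the
    angle of the cross term E_lr = sum(l * conj(r)).
    """
    l = np.asarray(l, dtype=complex)
    r = np.asarray(r, dtype=complex)
    e_l = np.sum(np.abs(l) ** 2)        # left energy
    e_r = np.sum(np.abs(r) ** 2)        # right energy
    e_lr = np.sum(l * np.conj(r))       # complex cross term
    iid = e_l / (e_r + eps)
    icc = np.abs(e_lr) / (np.sqrt(e_l * e_r) + eps)
    ipd = np.angle(e_lr)
    return iid, icc, ipd
```

In a real encoder the sums would typically run over the samples of one analysis window and one Bark or ERB band, and the values would be quantized before transmission.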
Parameter processor 209 and encoder 207 are coupled to an output processor 211, which multiplexes the encoded down-mix data and the parameter data to generate a compact encoded data signal, which specifically can be a bit stream.
Figure 3 illustrates the principle of generating the down-mix signal in encoder 109 and introduces the notation that will be used in the following description. As illustrated, the left ($l$) and right ($r$) input signals are separately fed to the frequency subband converters 201, 203. The outputs are K frequency subband signals $l_1, \ldots, l_K$ and $r_1, \ldots, r_K$, respectively, which are fed to the down-mix processor 205. The down-mix processor 205 generates the subband down-mix signals ($d_1, \ldots, d_K$) from the left and right subband signals ($l_1, \ldots, l_K$ and $r_1, \ldots, r_K$). These are fed to the down-mix encoder 207 to generate the time-domain down-mix signal $d$, which can then be encoded (in some embodiments, the subband down-mix is encoded directly).
In conventional systems, the down-mix is performed by a linear sum of the left and right signals in each sub-band. Typically, a passive down-mix is performed by simply adding or averaging the left and right signals. However, such an approach leads to substantial problems when the right and left signals are close to being out of phase with each other, since the resulting sum signal will be substantially reduced, and may even be reduced to zero for completely out-of-phase signals. In some conventional systems, the summed signal can be scaled to result in a down-mix signal with an energy corresponding to that of the input signals. However, this can still be problematic, since the relative error and uncertainty of the generated down-mix sample becomes more significant at lower values. Energy normalization will scale not only the down-mix, but also this associated error signal. In fact, for completely out-of-phase signals, the resulting sum or average signal is zero and therefore cannot be scaled.
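The cancellation effect can be made concrete with a small numerical sketch. The Python function below (its name and the test tone are illustrative choices, not part of the described system) measures how much of the input energy survives a fixed averaging down-mix when the right channel is a phase-shifted copy of the left channel.

```python
import numpy as np


def passive_downmix_energy_ratio(phase_deg, num_samples=64):
    """Energy of 0.5 * (l + r) relative to the total input energy,
    for a right channel that is a phase-shifted copy of the left."""
    n = np.arange(num_samples)
    left = np.exp(2j * np.pi * 0.05 * n)                  # complex subband tone
    right = left * np.exp(1j * np.deg2rad(phase_deg))     # phase-shifted copy
    downmix = 0.5 * (left + right)                        # fixed averaging down-mix
    energy_in = np.sum(np.abs(left) ** 2) + np.sum(np.abs(right) ** 2)
    return np.sum(np.abs(downmix) ** 2) / energy_in


for phase in (0, 90, 170, 180):
    print(phase, passive_downmix_energy_ratio(phase))
```

For this construction the ratio equals $(1 + \cos\varphi)/4$, so it falls from 0.5 for in-phase signals to zero for a 180 degree shift, at which point no scaling can restore the signal.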
In some systems, a weighted sum is used, where the weights are not simple unitary or scalar values, but, in addition, they introduce a phase shift to the left and right signals. This approach is used to provide phase alignment, so that the sum of the left and right signals is performed in phase, that is, it is used to align the phase of the signals to a coherent sum. However, generating such a phase-aligned down-mix has a number of disadvantages. In particular, it tends to be a complex and ambiguous operation, which can result in reduced audio quality.
However, in contrast to these approaches, the down-mix of the system in Figures 1 to 3 is generated using weights, which can not only have different phases, but can also have different amplitudes. Thus, the amplitude of the weights for the two channels may, at least for some characteristics of the signal, have different values. Thus, in the generated down-mix, the weight of the two stereo channels is different.
In addition, the subband weights applied to the combination of the left and right subband signals in a down-mix subband are also signal dependent, and vary as a function of the signal characteristics of the left and right signals. Specifically, in each subband, weights are determined depending on the signal characteristics in the subband. Thus, both phase and amplitude are signal dependent and may vary. Thus, the amplitude of the weights will vary over time.
Specifically, weights can be modified such that a bias towards different amplitudes for the weights is introduced for left and right signals that are increasingly out of phase with each other. For example, the difference in amplitude between the weights may be dependent on a cross-power measure for the left and right signals. The cross-power measure can be a cross correlation of the left and right signals. The cross-power measure can be normalized with respect to the energy in at least one of the left and right channels.
Thus, weights, and specifically both phase and amplitude, are, in the specific example, dependent on energy measurements for the left signal and the right signal, as well as on a correlation between them (such as, for example, represented by a cross-power measurement).
Weights are determined from characteristics of the left and right signals, and can be specifically determined without regard to the parameter data generated by the parameter processor 209. However, as will be shown later, the generated parameter data is also dependent on the signal energies, and this may allow the decoder to recreate the weights used in the down-mix from the parameter data. Thus, despite the fact that varying weights with different amplitudes are used, these weights do not need to be explicitly communicated to the decoder, but can be estimated based on the received parameter data. Thus, in contrast to expectations, no additional data overhead needs to be communicated to support weights with different amplitudes.
In addition, the use of different weights can avoid or mitigate the out-of-phase problems associated with conventional fixed summation, without having to perform phase alignment and thus introduce the disadvantages associated with it.
For example, a measure indicative of the power of a non-phase-aligned combination of the left and right signals relative to the combined power of the left and right signals can be generated. Specifically, the power / energy of the sum signal of the left and right signals can be determined and related to the sum of the power / energy of the left signal and the power / energy of the right signal. A higher value of this measure indicates that the left and right signals are not out of phase and that symmetric weights (of equal energy) can therefore be used for the down-mix. However, for increasingly out-of-phase signals, the first power (that of the sum signal) decreases towards zero, so a lower value of the measure indicates that the left and right signals are increasingly out of phase and that a simple sum will therefore not be advantageous as a down-mix signal. The weights can therefore be made increasingly asymmetric, resulting in a larger contribution from one channel than from the other in the down-mix and thus reducing the cancellation of one signal by the other. Indeed, for fully out-of-phase signals, the down-mix can, for example, be determined simply as one of the left and right signals, that is, the energy of one weight can be zero.
As a more specific example, a measure, r, reflecting the ratio between the energy of the sum of the left and right signals and the energy of the phase-aligned left and right signals (that is, the energy following coherent in-phase addition of the left and right signals) can be determined:
$$r = \frac{E\{\langle l + r,\; l + r\rangle\}}{E\{\langle l + e^{j\,ipd}\, r,\; l + e^{j\,ipd}\, r\rangle\}}$$
where ipd is the phase difference between the left and right signals (which is also one of the parameters determined by the parameter processor 209), <•> denotes the inner product and E {.} is the expectation operator.
The relative value above is thus generated to reflect a relative relationship between an energy measure for the sum of the left and right signals and an energy measure indicative of the combination energy with phase alignment of the left and right signals. The weights are then determined from this relative value.
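The ratio r is indicative of how far out of phase the two signals are. In particular, for completely out-of-phase signals, the ratio is equal to 0, and, for completely in-phase signals, the ratio is equal to 1. Thus, the ratio provides a normalized measure (in the range [0, 1]) of how much energy reduction occurs due to the phase differences between the left and right channels. It can be demonstrated that:
$$r = \frac{E_l + E_r + 2\,\Re\{E_{lr}\}}{E_l + E_r + 2\,\lvert E_{lr}\rvert}$$
where $E_l$ and $E_r$ are the left and right signal energies and $E_{lr}$ is the cross correlation between the left and right signals. Then using:
$$iid = \frac{E_l}{E_r}, \qquad icc = \frac{\lvert E_{lr}\rvert}{\sqrt{E_l E_r}}, \qquad ipd = \angle E_{lr}$$
where iid is the inter-channel level difference and icc is the inter-channel coherence, this can be demonstrated to lead to:
$$r = \frac{iid + 1 + 2\, icc\, \sqrt{iid}\, \cos(ipd)}{iid + 1 + 2\, icc\, \sqrt{iid}}$$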
Thus, as illustrated, the measure r, which is indicative of how far the signals are out of phase, can be derived from the parameter data and thus can be determined by decoder 115 without requiring any additional data to be communicated.
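A minimal Python sketch of this decoder-side computation follows. It assumes that iid is a linear energy ratio $E_l / E_r$, that icc is the normalized magnitude of the cross term and that ipd is its angle in radians; under those assumptions, dividing the numerator and denominator of the energy ratio by $E_r$ gives the expression used in the code. The function name is illustrative.

```python
import math


def phase_cancellation_ratio(iid, icc, ipd):
    """Ratio r in [0, 1] computed from the spatial parameters only.

    Assumed conventions: iid = E_l / E_r (linear), icc in [0, 1],
    ipd in radians.
    """
    root = math.sqrt(iid)
    numerator = iid + 1.0 + 2.0 * icc * root * math.cos(ipd)    # energy of the plain sum
    denominator = iid + 1.0 + 2.0 * icc * root                  # energy after phase alignment
    return max(0.0, numerator / denominator)
```

Equal-level, fully coherent channels give r = 1 when in phase and r = 0 when the phase difference is 180 degrees, and uncorrelated channels (icc = 0) give r = 1 regardless of the phase parameter.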
The ratio can be used to generate the weights for the down-mix signals. Specifically, the down-mix signal can, in each subband, be generated as:
$$d = w_1\, l + w_2\, r$$
Weights can be generated from the ratio r, such that the asymmetry (energy difference) increases as r approaches zero. For example, an intermediate value can be generated as:
$$q = r^{1/4}$$
Using the intermediate value q, two gains $g_1$ and $g_2$ are calculated, with $g_2 = q$. The weights can then be determined by an optional energy normalization:
$$w_1 = g_1 \cdot c, \qquad w_2 = g_2 \cdot c$$
where c is chosen to provide the desired normalization. Specifically, c can be selected such that the energy of the resulting down-mix is equal to the energy of the left signal plus the energy of the right signal.
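The mapping from the ratio to the weights can be sketched in Python as follows. Only $q = r^{1/4}$ and $g_2 = q$ are taken from the text; the complementary gain $g_1 = 2 - q$ is an assumption of this sketch, chosen because it yields equal gains at q = 1 and ignores one channel at q = 0. The gains are real-valued here, and the parameter conventions (linear iid, ipd in radians) are likewise assumed.

```python
import math


def downmix_weights(iid, icc, ipd):
    """Return (w1, w2) for d = w1 * l + w2 * r from the spatial parameters."""
    root = math.sqrt(iid)
    cross = icc * root * math.cos(ipd)          # Re{E_lr} / E_r
    r = max(0.0, (iid + 1.0 + 2.0 * cross) / (iid + 1.0 + 2.0 * icc * root))
    q = r ** 0.25                               # intermediate value
    g1 = 2.0 - q                                # ASSUMPTION: complementary gain
    g2 = q
    # Energy of g1*l + g2*r and target energy E_l + E_r, both relative to E_r.
    e_mix = g1 * g1 * iid + g2 * g2 + 2.0 * g1 * g2 * cross
    e_target = iid + 1.0
    c = math.sqrt(e_target / e_mix) if e_mix > 0.0 else 0.0
    return g1 * c, g2 * c
```

Because every quantity is derived from iid, icc and ipd, an encoder and a decoder evaluating the same function would arrive at the same weights.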
As another example, the intermediate value can be generated as:
which will tend to provide weights that are constant (completely symmetrical or completely asymmetric) for an increasing variety of signal conditions.
Thus, encoder 109 can, in such an embodiment, employ a flexible and dynamic down-mix, where the weights are automatically adapted to the specific signal conditions, such that the disadvantages associated with a fixed down-mix or with phase alignment can be avoided or mitigated. In effect, the approach can gradually and automatically adapt from a completely symmetric down-mix, treating both channels equally, to a completely asymmetric down-mix, where one channel is completely ignored. This adaptation can allow the down-mix to provide an improved signal on which the up-mix is based, while at the same time generating a down-mix signal that can be used directly (that is, it can be used as a mono signal). In addition, the example described provides a very smooth and gradual transition of the energy difference, thus providing an improved listening experience.
In addition, as will be demonstrated later, this improved performance can be achieved without requiring that any additional data be distributed to provide information on the selected weights. Specifically, as shown above, weights can be determined from transmitted parameter data and, as will be demonstrated later, conventional approaches to up-mixing based on assumptions of equal down-mixing weights can be modified and extended to allow up-mixing for weights with different energies (or equivalently different amplitudes or powers).
In the following, another example of a coding approach using different down-mix weights will be described. In some situations, the down-mix can be created without using parameter data. In other situations or embodiments, parameter data can also be used in the decoder to determine the weights. The approach is based on determining a plurality of intermediate down-mixes using predetermined weights (which, specifically, can be energy symmetric, that is, they can have the same energy and only, for example, introduce a phase shift). The intermediate down-mixes are then combined into a single down-mix, where each of the intermediate down-mixes is weighted depending on its energy. Thus, intermediate down-mixes that have low energy because they originate from the combination of substantially out-of-phase signals have less weight than intermediate down-mixes that have high energy because they originate from more coherent combinations. The resulting down-mix can then be energy normalized relative to the input signals.
In more detail, a set of P different a priori (intermediate) sub-band down-mixes $d_{p,k}$, $p = 1, \ldots, P$, is generated as:
$$d_{p,k} = w_{p,1}\, l_k + w_{p,2}\, r_k$$
Typically, the number of intermediate down-mixes can be kept low, thus resulting in low complexity and reduced computational requirements. In particular, the number of intermediate subband down-mixes is ten or less and a particularly advantageous trade-off between complexity and performance has been found for four intermediate down-mixes.
In the specific example, four (P = 4) intermediate down-mixes a priori (predetermined and fixed) are used with the specific weights:
The weights can also be expressed in matrix form:

These a priori down-mixes correspond to ideal down-mixes for cases in which the left and right signals are equal in amplitude and out of phase by 0, 90, 180 or 270 degrees. Alternatively, a set of two a priori down-mixes can be used, for example, p = 1 and p = 4.
Then, the energies $E_{p,k}(n)$ for each of these options are determined by
$$E_{p,k}(n) = \sum_{m} w(m)\, \lvert d_{p,k}(n + m)\rvert^{2}$$
with w being an optional window centered around the sample index n. The intermediate subband down-mixes are combined to form a new subband down-mix $\tilde{d}_k$ by
$$\tilde{d}_k(n) = \sum_{p=1}^{P} \alpha_{p,k}(n)\, d_{p,k}(n)$$
where the weights $\alpha_{p,k}$ are determined from the relative strength of the down-mixes. Thus, the different intermediate down-mixes are combined into a single down-mix, weighting each of them according to their relative strength.
Relative strength can be based on energy, such as, for example,
$$\alpha_{p,k}(n) = \frac{E_{p,k}(n)}{\sum_{q=1}^{P} E_{q,k}(n) + \varepsilon}$$
where ε is a small positive constant to avoid division by zero. Other measures, such as envelope measures, can certainly be used.
The final down-mix $d_k$ is generated from $\tilde{d}_k$ by energy normalization. Specifically, the energy of $\tilde{d}_k$ can be determined, and the scaling required to make it equal to the sum of the energies of the left and right signals can be performed.
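The procedure of forming intermediate down-mixes, weighting them by relative energy and normalizing the result can be sketched in Python as below. The four weight pairs, including the scaling of 0.5 and the sign of the phase rotation, are assumptions chosen so that each pair aligns a right channel that is phase-shifted by 0, 90, 180 or 270 degrees; the energies are taken over the whole block (a rectangular window), and all names are illustrative.

```python
import numpy as np

# ASSUMPTION: a priori weight pairs (w_p1, w_p2); not the source's exact values.
PHASES = np.deg2rad([0.0, 90.0, 180.0, 270.0])
A_PRIORI_WEIGHTS = 0.5 * np.stack(
    [np.ones(4, dtype=complex), np.exp(-1j * PHASES)], axis=1
)  # shape (P, 2)


def combine_intermediate_downmixes(l, r, weights=A_PRIORI_WEIGHTS, eps=1e-12):
    """Return (down-mix, alpha) for one subband block."""
    l = np.asarray(l, dtype=complex)
    r = np.asarray(r, dtype=complex)
    # Intermediate down-mixes d_p = w_p1 * l + w_p2 * r, shape (P, N).
    d_p = weights[:, 0:1] * l[None, :] + weights[:, 1:2] * r[None, :]
    e_p = np.sum(np.abs(d_p) ** 2, axis=1)      # energy of each intermediate down-mix
    alpha = e_p / (np.sum(e_p) + eps)           # relative strength
    d_tilde = alpha @ d_p                       # energy-weighted combination
    # Energy normalization to the sum of the input energies.
    e_in = np.sum(np.abs(l) ** 2) + np.sum(np.abs(r) ** 2)
    e_out = np.sum(np.abs(d_tilde) ** 2)
    return np.sqrt(e_in / (e_out + eps)) * d_tilde, alpha
```

For a right channel that is the exact negative of the left channel, the first intermediate down-mix vanishes and receives zero weight, while the third dominates, so the combined down-mix keeps the full input energy instead of cancelling.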
As a specific example, for each down-mix, the ratio of energy and biased sum can be calculated as:
where b (m) is a bias function that can introduce an additional bias towards the standard down-mix, according to:
and the final weights are determined by an energy normalization, $w_1 = g_1 \cdot c$ and $w_2 = g_2 \cdot c$, where c is selected such that the energy of the resulting down-mix is equal to the energy of the left channel plus the energy of the right channel.
It should be noted that these approaches allow the weights to be generated by decoder 115 using the received parameter data and do not require additional data to be transmitted.
The approach described avoids or mitigates the disadvantages of both the passive and the active (fixed) down-mix associated with out-of-phase signals, without having to use phase alignment and incur the associated disadvantages.
An advantage of the approach described is that the linear combination of a plurality of different intermediate down-mixes provides additional robustness, since out-of-phase problems tend to be restricted to only one or possibly two of the down-mixes. In addition, by using only four intermediate down-mixes, efficient operation with a low demand for computing resources can be achieved.
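It is also worth mentioning that, fundamentally, the down-mix signal $d_k$ is only a linear combination of the left and right signals, that is,
$$d_k = \beta_{k,1}\, l_k + \beta_{k,2}\, r_k$$
where each $\beta_{k,i}$, $i = 1, 2$, depends on $E_{p,k}$ and the chosen $w_{p,i}$.
It is also worth mentioning that $E_{p,k}$ depends on the energies of the left and right signals and the cross energy. In particular, it can be demonstrated that:
$$E_{p,k} = \lvert w_{p,1}\rvert^{2} E_{1} + \lvert w_{p,2}\rvert^{2} E_{2} + 2\,\Re\{ w_{p,1}\, w_{p,2}^{*}\, E_{12} \}$$
where $E_1$, $E_2$ and $E_{12}$ are the left, right and cross energies of the subband (with $E_{12}$ taken as the expectation of $l_k r_k^{*}$) and $\Re$ denotes the real part of a complex number.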
This allows for a simpler computational scheme, since the energies of the intermediate down-mixes do not need to be measured and, in fact, the intermediate down-mixes do not need to be explicitly generated. Instead, the $\alpha_{p,k}$ values can be derived from the a priori down-mix weights $w_{p,i}$ and the energies $E_{p,k}$, where the latter follow directly from the measured energies and cross energy of the original signals, as indicated above.
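This shortcut can be sketched in Python as follows. The sketch assumes the cross energy is defined as the sum of `l * conj(r)`, expands the energy of `w1 * l + w2 * r` accordingly, and collapses the energy-weighted combination into two effective weights (before any energy normalization). The function name and return layout are illustrative.

```python
import numpy as np


def effective_weights(weights, e1, e2, e12, eps=1e-12):
    """Energies, relative strengths and effective weights without mixing.

    weights: array of shape (P, 2) holding the a priori pairs (w_p1, w_p2).
    e1, e2:  left and right energies; e12: complex cross energy,
             ASSUMED to be sum(l * conj(r)).
    """
    w = np.asarray(weights, dtype=complex)
    # E_p = |w_p1|^2 E1 + |w_p2|^2 E2 + 2 Re{w_p1 conj(w_p2) E12}
    e_p = (np.abs(w[:, 0]) ** 2 * e1
           + np.abs(w[:, 1]) ** 2 * e2
           + 2.0 * np.real(w[:, 0] * np.conj(w[:, 1]) * e12))
    alpha = e_p / (np.sum(e_p) + eps)
    beta = alpha @ w            # (beta_1, beta_2), prior to energy normalization
    return e_p, alpha, beta
```

Since only three measured quantities per subband are needed, the same evaluation can be repeated by any party that knows those energies or parameters derived from them.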
Consequently, $\beta_{k,i}$ follows from the chosen $w_{p,i}$ and the measured energies and the cross energy, since
$$\beta_{k,i} \propto \sum_{p=1}^{P} \alpha_{p,k}\, w_{p,i}$$
In addition, energy compensation easily follows from incoming energies and
The approach described may be less efficient for situations where the correlation between the left and right signals is low, or when the energies of the left and right signals are substantially different. However, in these cases, a good down-mix is provided by simply adding the left and right signals.
This consideration can be used to modify the approach as follows. First, the modulation index μ is defined as
where $E_1$, $E_2$ and $E_{12}$ are the left signal energy, the right signal energy and the cross energy, respectively. Note that $0 \le \mu \le 1$. The calculation of $\alpha$ can now be adapted to prefer the down-mix p = 1 (assuming that it corresponds to an average signal, as in our example) if μ is low, for example
This leads to the creation of a down-mix that is numerically robust and that also includes out-of-phase components.
Once again, it should be noted that the generation of the down-mix using intermediate fixed down-mixes is based on down-mix parameters that are, in fact, signal dependent. However, the resulting down-mix weights depend only on the energies $E_1$, $E_2$ and the cross energy $E_{12}$. Since this is also the case for the parameter data (for example, the generated ILD, IPD and IC), it is possible for the decoder 115 to derive the applied weights from the transmitted parameter data. Specifically, the weights can be found by the decoder by evaluating the same functions described above with reference to the encoder 109.
In more detail, the weight for a given down-mix signal can be found from the parameters by first considering μ as:
Then, using the following relationship, it can be calculated for all p:
From this, $\beta_{k,i}$ follows as:

In the examples above, several encoder approaches have been described, which apply a signal-dependent dynamic variation of the down-mix weights (including amplitude variations) to provide an improved and more robust down-mix signal. The approaches specifically use asymmetric weights (with potentially different amplitudes) to improve performance. In addition, as demonstrated, the down-mix weights can be derived from the parameter data and thus can be determined by the decoder, allowing the decoder to perform up-mixing based on the assumption that the encoder uses weights with different energies. This up-mixing is based only on the down-mix signal and the spatial parameters and does not require additional information. Thus, the operation of the decoder has been modified to take into account weights that have different amplitudes, and it is therefore not based on the assumption of down-mix weights with equal amplitude, as in conventional decoders. In the following, different examples of such decoders will be described and it will be demonstrated that not only can up-mixing approaches be modified to operate with down-mix weights of asymmetric amplitude, but also that this can be achieved based on existing parameter data and without requiring additional data to be communicated.
Figure 4 illustrates an example of a decoder according to some embodiments of the invention.
The decoder comprises a receiver 401, which receives the data stream from encoder 109. The receiver 401 is coupled to a parameter processor 403, which receives the parameter data from the data stream. Thus, the parameter processor 403 receives the IID, IPD and ICC values from the data stream.
The receiver 401 is additionally coupled to a down-mix decoder 405, which decodes the received encoded down-mix signal. The down-mix decoder 405 performs the inverse function of down-mix encoder 207 of encoder 109 and thus generates a subband signal in the frequency domain (or a signal in the time domain which is then converted to a subband signal in the frequency domain).
The down-mix decoder 405 is additionally coupled to an up-mix processor 407, which is also coupled to the parameter processor 403. The up-mix processor 407 up-mixes the down-mix signal to generate a multichannel signal (which in the specific example is a stereo signal). In the specific example, the mono down-mix is up-mixed to the left and right channels of a stereo signal. Up-mixing is performed based on the parameter data and on down-mix weight estimates, which can be generated from the parameter data. The up-mixed stereo signal is fed to an output circuit 409, which, in the specific example, may include a conversion from the frequency subband domain to the time domain. Output circuit 409 can specifically include an inverse QMF or FFT transform.
In the decoder of Figure 4, the parameter processor 403 is coupled to a weight processor 411, which is additionally coupled to the up-mix processor 407. The weight processor 411 is arranged to estimate the down-mix weights from the received parameter data. This determination is not limited to an assumption of equal weights. Instead, while decoder 115 may not necessarily know exactly which down-mix weights have been applied in encoder 109, the decoding is based on the use of potentially asymmetric weights with an amplitude difference between the weights. Thus, the received parameters are used to determine the energy / amplitude and / or the angle of the weights. In particular, the determination of weights is carried out in response to parameters indicative of energy relationships between channels. Specifically, the determination is not limited to the IPD phase value, but is in response to the IID and / or ICC values.
The determination of the weights specifically uses the same approach previously described for encoder 109. Thus, the same calculations previously described for encoder 109 can be performed by the weight processor 411 to result in the weights $w_1$ and $w_2$ that will have been (or are assumed to have been) used by the corresponding encoder 109.
The up-mixing performed by conventional decoders is based on an assumption that the applied weights are identical for the two channels or differ only by a phase value. However, in decoder 115 of Figure 4, the up-mixing also takes into account the difference in amplitude between the weights, and is specifically modified such that the estimated weights $w_1$ and $w_2$ from the weight processor 411 are used to modify the up-mixing. Thus, conventional up-mix approaches have been modified to additionally consider signal-dependent weights that vary dynamically, whose estimates are calculated from the received parameter data.
Below, specific examples of up-mix algorithms that have been extended to accommodate weights with different energies will be presented.
Up-mix methods that use an Overall Phase Difference (OPD) indicative of the absolute (average) phase shift of the left and right subband channels relative to a reference (typically the left channel) are known.
Specifically, the Parametric Stereo standard uses the following up-mix:
where s is the received mono down-mix signal and $s_d$ is a decorrelated signal generated by the decoder, as will be known to the person skilled in the art. $c_1$ and $c_2$ are gains that ensure correct level differences between the left and right signals.
Specifically, $c_1$, $c_2$, α and β can be determined as:

This equation is still valid for the situation where the weights $w_1$ and $w_2$ have different energies, provided the OPD value is modified appropriately. Thus, it is not necessary to modify the above equation to decode the signals while allowing for energy differences between the weights. This is because the up-mix matrix always reinstates the correct spatial cues (IID, ICC, IPD), regardless of the OPD. The OPD can be seen as an additional degree of freedom. The OPD is defined as the angle between the left channel and the sum signal s, generated as the sum of the left and right signals:

where $P_{ll}$ is the power of the left signal, and $P_{lr}$ is the cross power (cross correlation) of the left and right signals. Thus:
where $P_{rr}$ is the power of the right signal.
Thus, the weights $w_1$ and $w_2$ can first be determined by the weight processor 411 based on the parameter data as previously described, and the estimated weights can then be used together with the parameter data to generate an overall phase value that takes into account the potentially asymmetric weights (that is, the difference between the weights, including the amplitude asymmetry). The overall phase value generated can then be used to generate the up-mix signal from the down-mix signal and a decorrelated signal.
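A hedged Python sketch of such an overall phase computation is given below. It assumes that the OPD is the angle of the correlation between the left channel and the weighted sum $w_1 l + w_2 r$, that iid is the linear energy ratio of left to right, and that the cross term is the expectation of `l * conj(r)` with angle ipd. These conventions, and the function name, are assumptions of the sketch rather than the expressions of the source.

```python
import cmath
import math


def overall_phase_difference(w1, w2, iid, icc, ipd):
    """Angle between the left channel and s = w1 * l + w2 * r.

    With P_ll / P_rr = iid and P_lr / P_rr = icc * sqrt(iid) * exp(j * ipd),
    the correlation E{l * conj(s)} is proportional to
    conj(w1) * iid + conj(w2) * icc * sqrt(iid) * exp(j * ipd).
    """
    cross = icc * math.sqrt(iid) * cmath.exp(1j * ipd)
    corr = complex(w1).conjugate() * iid + complex(w2).conjugate() * cross
    return cmath.phase(corr)
```

With equal weights this reduces to the angle of the plain sum signal, whereas with $w_2 = 0$ the down-mix is a scaled copy of the left channel and the overall phase is zero whatever the inter-channel phase.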
In some embodiments, the OPD value can be generated under the assumption that the channels are correlated, that is, that the icc parameter has a unit value. This leads to the following OPD value:

Thus, the decoder can generate an up-mix signal that does not suffer as much from the typical disadvantages associated with a fixed sum or phase alignment down-mix approach. In addition, this is achieved without requiring additional data to be sent.
As another example, the up-mixing can be based on a prediction of a difference signal from the down-mix signal. The down-mix signal is generated as
$$s = w_1\, l + w_2\, r$$
where both $w_1$ and $w_2$ can be complex. Then, an auxiliary signal can be constructed using a scaled complex rotation, resulting in a general down-mix matrix of:

Thus, the auxiliary signal d represents a difference signal for the left and right signals.
The resulting theoretical up-mix matrix can be determined as:

The difference signal can be expressed as the sum of a predictable component, which can be predicted from the down-mix signal s, and an unpredictable component, which is decorrelated from the down-mix signal s. Thus, d can be expressed as:
$$d = \alpha \cdot s + \beta \cdot s_d$$
where $s_d$ is a decorrelated sum signal generated by the decoder, α is a complex prediction factor, and β is a (real-valued) decorrelation scaling factor. This leads to:

Thus, given that the prediction factor α and the scaling factor β can be determined, the up-mix can be generated by this approach.
In the previous equation to generate the difference signal, the second term of β-sd represents the part of the difference signal that cannot be predicted from the down-mix signal s. To maintain a low data rate, this residual signal component is typically not communicated to the decoder and, therefore, the up-mix is based on the locally generated de-correlated signal and the de-correlation scaling factor.
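The split of the difference signal into a predicted part and a residual can be illustrated with the Python sketch below. It works directly on signal blocks rather than on transmitted parameters, uses a least-squares predictor for α, and scales β under the assumption that the decorrelated signal has the same energy as the down-mix. The function name and these choices are assumptions of the sketch.

```python
import numpy as np


def prediction_parameters(s, d, eps=1e-12):
    """Return (alpha, beta) such that d is approximated by alpha * s + beta * s_d.

    alpha: complex least-squares prediction factor of d from s.
    beta:  real scaling so that beta * s_d carries the residual energy,
           ASSUMING s_d has the same energy as s.
    """
    s = np.asarray(s, dtype=complex)
    d = np.asarray(d, dtype=complex)
    e_s = np.sum(np.abs(s) ** 2)
    alpha = np.sum(d * np.conj(s)) / (e_s + eps)     # projection of d onto s
    residual = d - alpha * s                         # part that cannot be predicted
    beta = np.sqrt(np.sum(np.abs(residual) ** 2) / (e_s + eps))
    return alpha, beta
```

When the residual itself is transmitted, only α is needed and the residual takes the place of the scaled decorrelated signal.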
However, in some cases, the residual signal $\beta \cdot s_d$ is encoded as a signal $d_{res}$ and communicated to the decoder. In such cases, the difference signal can be given as:
$$d = \alpha \cdot s + d_{res}$$
which leads to:

In addition, both the prediction factor α and the decorrelation scaling factor β can be determined from the received parameter data:
Thus, the prediction-based approach allows an up-mix to be generated that is based on the assumption that weights with asymmetric energies are used for the down-mix. In addition, the up-mix process is controlled by the parameter data and no additional information needs to be transmitted by the encoder.
In more detail, the complex prediction factor α and the decorrelation scaling factor β can be derived from the following considerations.
First, the prediction parameter α is given as:
Then, using the parameter definition:
The scaling factor β is given as:
using the assumption that the power of the de-correlated signal corresponds to the power of the sum signal.

The previous examples described a system which allows variant and asymmetric weights (including asymmetry of amplitude between weights) to be used with a down-mix / up-mix system without requiring additional parameters to be communicated. Instead, the weights and the up-mix operation can be based on the parameter data.
Such an approach is particularly advantageous when the sub-bands used for the down-mix and the up-mix correspond relatively closely to the analysis bands for which the parameters are calculated.
This can often be the case for lower frequencies, where the down-mix sub-bands and the parametric analysis frequency bands tend to coincide. However, in some embodiments, it may be advantageous, for example, to have down-mix sub-bands with a finer frequency and / or time resolution than the analysis frequency bands, since this can, in some situations, result in improved audio quality. This can particularly be the case for higher frequencies.
Thus, at higher frequency ranges, the correspondence between the down-mix sub-bands and the parameter analysis bands may be less direct. Since weights can be different for individual down-mix sub-bands, the correspondence between the parameter data and the individual weights for each sub-band may be less accurate. However, parameter data can typically be used to generate a coarser estimate of the down-mix weights, and typically the associated quality degradation will be acceptable.
Specifically, in some embodiments, the encoder can evaluate the difference between the actual down-mix weights used in each subband and those that can be calculated based on the parameter data of the wider analysis band. If the discrepancy becomes too large, the encoder may include an indication of this. Thus, the encoder can include an indication of whether or not the parameter data should be used to generate the weights for at least one time-frequency interval (for example, for a down-mix subband of a segment). If the indication is that parameter data should not be used, the decoder may instead use another approach, such as, for example, basing the up-mix on the assumption that the down-mix is a simple summation.
In some embodiments, the encoder may additionally be arranged to include an indication of the down-mix weights used for sub-bands for which the indication shows that the parameter data is insufficient to estimate the weights. In such embodiments, the decoder 115 can thus directly extract these weights and apply them to the appropriate sub-bands. Weights can be communicated as absolute values or they can, for example, be communicated as relative values, such as, for example, the difference between the actual weights and those that are calculated using the parameter data.
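The signalling logic described in the two preceding paragraphs can be sketched in Python as follows. The threshold value, the use of a maximum absolute difference as the discrepancy measure, and the function names are hypothetical choices for this illustration; the text does not specify them.

```python
def encode_weight_side_info(actual, estimated, threshold=0.1):
    """Encoder side: decide whether parameter-derived weights suffice.

    Returns (use_parameters, deltas). deltas is None when the decoder can
    rely on the parameter data alone; otherwise it holds the differences
    between the actual and the parameter-derived weights.
    The threshold is a HYPOTHETICAL tuning value.
    """
    deltas = tuple(a - e for a, e in zip(actual, estimated))
    if max(abs(x) for x in deltas) <= threshold:
        return True, None
    return False, deltas


def decode_weights(estimated, use_parameters, deltas):
    """Decoder side: recover the weights to use for the up-mix."""
    if use_parameters:
        return tuple(estimated)
    return tuple(e + x for e, x in zip(estimated, deltas))
```

Sending differences rather than absolute values keeps the side information small when the parameter-derived estimate is already close to the weights actually used.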
It will be appreciated that the above description, for clarity, described embodiments of the invention with reference to different functional circuits, units and processors. However, it will be apparent that any suitable distribution of functionality between different functional circuits, units or processors can be used without departing from the invention. For example, features illustrated as being performed by separate processors or controllers can be performed by the same processor or controller. Thus, references to specific functional units or circuits should be seen only as references to appropriate means to provide the described functionality rather than indicative of a precise logical or physical structure or organization.
The invention can be implemented in any suitable form, including hardware, software, firmware or any combination thereof. The invention can optionally be implemented at least partially as computer software running on one or more data processors and / or digital signal processors. The elements and components of an embodiment of the invention can be physically, functionally and logically implemented in any suitable manner. In effect, the functionality can be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention can be implemented in a single unit or it can be physically and functionally distributed among different units, circuits and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Instead, the scope of the present invention is limited only by the appended claims. Additionally, although a feature may appear to have been described in connection with particular embodiments, a person skilled in the art would recognize that various features of the described embodiments can be combined according to the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.
In addition, although individually listed, a plurality of means, elements, circuits or method steps can be implemented by, for example, a single circuit, unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and their inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. In addition, the inclusion of a feature in one claim category does not imply a limitation to that category, but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of the features in the claims does not imply any specific order in which the features must be worked, and in particular, the order of individual steps in a method claim does not imply that the steps must be performed in that order. Instead, the steps can be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus, references to "a", "an", "first", "second", etc., do not exclude a plurality.
Reference signs in the claims are provided merely as a clarifying example and should not be construed as limiting the scope of the claims in any way.
Claims (14)
[0001]
1. DECODER FOR GENERATING A MULTI-CHANNEL AUDIO SIGNAL, the decoder (115) characterized by comprising: a first receiver (401, 405) for receiving a down-mix signal being a combination of at least a first channel signal weighted by a first weight and a second channel signal weighted by a second weight, the first weight and the second weight having different amplitudes for at least some time-frequency intervals; a second receiver (401, 403) for receiving up-mix parameter data characterizing a relationship between the first channel signal and the second channel signal; a circuit (411) for generating a first weight estimate for the first weight and a second weight estimate for the second weight from the up-mix parameter data; and an up-mixer (407) for generating the multichannel audio signal by up-mixing the down-mix signal in response to the up-mix parameter data, the first weight estimate and the second weight estimate, the up-mixing being dependent on an amplitude of at least one of the first weight estimate and the second weight estimate.
[0002]
2. DECODER, according to claim 1, characterized in that the circuit (411) is arranged to generate the first weight estimate and the second weight estimate with different relationships to at least some parameters of the parameter data, for at least some time-frequency intervals.
[0003]
3. DECODER, according to claim 2, characterized in that the up-mixer (407) is arranged to determine at least one of the first weight estimate and the second weight estimate as a function of an energy parameter of the up-mix parameter data, the energy parameter being indicative of a relative energy characteristic for the first channel signal and the second channel signal.
[0004]
4. DECODER, according to claim 3, characterized by the energy parameter being at least one of: an Inter-channel Intensity Difference, IID, parameter; an Inter-channel Level Difference, ILD, parameter; and an Inter-channel Coherence/Correlation, IC/ICC, parameter.
[0005]
5. DECODER, according to claim 1, characterized by the up-mix parameter data comprising a precision indication for a relationship between the first weight and the second weight and the up-mix parameter data, and the decoder (115) being arranged to generate at least one of the first weight estimate and the second weight estimate in response to the precision indication.
[0006]
6. DECODER, according to claim 1, characterized in that at least one of the first weight and the second weight, for at least one frequency range, has a finer frequency-temporal resolution than a corresponding parameter of the up-mix parameter data.
[0007]
7. DECODER, according to claim 1, characterized by the up-mixer (407) being arranged to generate an Overall Phase Difference value in response to the parameter data and to perform the up-mixing in response to the Overall Phase Difference value, the Overall Phase Difference value being dependent on the first weight estimate and the second weight estimate.
[0008]
8. DECODER, according to claim 1, characterized by the up-mixing being independent of the amplitude of at least one of the first weight estimate and the second weight estimate, except for the Overall Phase Difference value.
[0009]
9. DECODER, according to claim 1, characterized in that the up-mixer (407) is arranged to: generate a decorrelated signal from the down-mix signal, the decorrelated signal being decorrelated with the down-mix signal; and perform an up-mix of the down-mix signal by applying a matrix multiplication to the down-mix signal and the decorrelated signal, wherein coefficients of the matrix multiplication are dependent on the first weight estimate and the second weight estimate.
[0010]
10. DECODER, according to claim 1, characterized in that the up-mixer (407) is arranged to determine the first weight estimate by: determining a first energy measure indicative of an energy of a non-phase-aligned combination of the first channel signal and the second channel signal in response to the up-mix parameter data; determining a second energy measure indicative of an energy of a phase-aligned combination of the first channel signal and the second channel signal in response to the up-mix parameter data; determining a first measure of the first energy measure relative to the second energy measure; and determining the first weight estimate in response to the first measure.
[0011]
11. DECODER, according to claim 1, characterized by the up-mixer (407) being arranged to determine the first weight estimate by: determining, for each of a plurality of pairs of predetermined values of the first weight and the second weight, in response to the parameter data, an energy measure indicative of a down-mix energy corresponding to the pair of predetermined values; and determining the first weight in response to the energy measures and the pairs of predetermined values.
[0012]
12. ENCODER FOR GENERATING AN ENCODED REPRESENTATION OF A MULTI-CHANNEL AUDIO SIGNAL, characterized by comprising at least a first channel and a second channel, the encoder comprising: a down-mixer (201, 203, 205) for generating a down-mix signal as a combination of at least a first channel signal of the first channel weighted by a first weight and a second channel signal of the second channel weighted by a second weight, the first weight and the second weight having different amplitudes for at least some time-frequency intervals; a circuit (201, 203, 209) for generating up-mix parameter data characterizing a relationship between the first channel signal and the second channel signal, the up-mix parameter data additionally characterizing the first weight and the second weight; and a circuit (207, 211) for generating the encoded representation to include the down-mix signal and the up-mix parameter data, wherein the down-mixer (201, 203, 205) is arranged to: determine a first energy measure indicative of an energy of a non-phase-aligned combination of the first channel signal and the second channel signal; determine a second energy measure indicative of an energy of a phase-aligned combination of the first channel signal and the second channel signal; determine a first measure of the first energy measure relative to the second energy measure; and determine the first weight and the second weight in response to the first measure.
[0013]
13. METHOD OF GENERATING A MULTI-CHANNEL AUDIO SIGNAL, the method characterized by comprising: receiving a down-mix signal being a combination of at least a first channel signal weighted by a first weight and a second channel signal weighted by a second weight, the first weight and the second weight having different amplitudes for at least some time-frequency intervals; receiving up-mix parameter data characterizing a relationship between the first channel signal and the second channel signal; generating a first weight estimate for the first weight and a second weight estimate for the second weight from the up-mix parameter data; and generating the multichannel audio signal by up-mixing the down-mix signal in response to the up-mix parameter data, the first weight estimate and the second weight estimate, the up-mixing being dependent on an amplitude of at least one of the first weight estimate and the second weight estimate.
[0014]
14. METHOD OF GENERATING AN ENCODED REPRESENTATION OF A MULTI-CHANNEL AUDIO SIGNAL, characterized by comprising at least a first channel and a second channel, the method comprising: generating a down-mix signal as a combination of at least a first channel signal of the first channel weighted by a first weight and a second channel signal of the second channel weighted by a second weight, the first weight and the second weight having different amplitudes for at least some time-frequency intervals; generating up-mix parameter data characterizing a relationship between the first channel signal and the second channel signal, the up-mix parameter data additionally characterizing the first weight and the second weight; and generating the encoded representation to include the down-mix signal and the up-mix parameter data.
Patent family:
Publication number | Publication date
TW201145259A|2011-12-16|
WO2011058484A1|2011-05-19|
MX2012005414A|2012-06-14|
EP2499638A1|2012-09-19|
JP2013511062A|2013-03-28|
RU2012123750A|2013-12-20|
CN102598122A|2012-07-18|
US9070358B2|2015-06-30|
RU2560790C2|2015-08-20|
CN102598122B|2014-10-29|
EP2323130A1|2011-05-18|
EP2499638B1|2015-02-25|
KR20120089335A|2012-08-09|
BR112012011084A2|2017-09-19|
TWI573130B|2017-03-01|
JP5643834B2|2014-12-17|
US20120224702A1|2012-09-06|
KR101732338B1|2017-05-04|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title

US5956674A|1995-12-01|1999-09-21|Digital Theater Systems, Inc.|Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels|
KR101035104B1|2003-03-17|2011-05-19|코닌클리케 필립스 일렉트로닉스 엔.브이.|Processing of multi-channel signals|
US7447317B2|2003-10-02|2008-11-04|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V|Compatible multi-channel coding/decoding by weighting the downmix channel|
US7394903B2|2004-01-20|2008-07-01|Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V.|Apparatus and method for constructing a multi-channel output signal or for generating a downmix signal|
US7392195B2|2004-03-25|2008-06-24|Dts, Inc.|Lossless multi-channel audio codec|
JP5032977B2|2004-04-05|2012-09-26|コーニンクレッカフィリップスエレクトロニクスエヌヴィ|Multi-channel encoder|
DE102004043521A1|2004-09-08|2006-03-23|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Device and method for generating a multi-channel signal or a parameter data set|
JP4892184B2|2004-10-14|2012-03-07|パナソニック株式会社|Acoustic signal encoding apparatus and acoustic signal decoding apparatus|
US7720230B2|2004-10-20|2010-05-18|Agere Systems, Inc.|Individual channel shaping for BCC schemes and the like|
US7961890B2|2005-04-15|2011-06-14|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung, E.V.|Multi-channel hierarchical audio coding with compact side information|
JP2006325162A|2005-05-20|2006-11-30|Matsushita Electric Ind Co Ltd|Device for performing multi-channel space voice coding using binaural queue|
ES2433316T3|2005-07-19|2013-12-10|Koninklijke Philips N.V.|Multi-channel audio signal generation|
MX2008001307A|2005-07-29|2008-03-19|Lg Electronics Inc|Method for signaling of splitting information.|
ES2587999T3|2005-10-20|2016-10-28|Lg Electronics Inc.|Procedure, apparatus and computer-readable recording support to decode a multichannel audio signal|
KR101218776B1|2006-01-11|2013-01-18|삼성전자주식회사|Method of generating multi-channel signal from down-mixed signal and computer-readable medium|
KR101358700B1|2006-02-21|2014-02-07|코닌클리케 필립스 엔.브이.|Audio encoding and decoding|
WO2007111568A2|2006-03-28|2007-10-04|Telefonaktiebolaget L M Ericsson |Method and arrangement for a decoder for multi-channel surround sound|
PL2067138T3|2006-09-18|2011-07-29|Koninl Philips Electronics Nv|Encoding and decoding of audio objects|
MX2009003564A|2006-10-16|2009-05-28|Fraunhofer Ges Forschung|Apparatus and method for multi-channel parameter transformation.|
US8571875B2|2006-10-18|2013-10-29|Samsung Electronics Co., Ltd.|Method, medium, and apparatus encoding and/or decoding multichannel audio signals|
EP2464145A1|2010-12-10|2012-06-13|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for decomposing an input signal using a downmixer|
CN103403801B|2011-08-29|2015-11-25|华为技术有限公司|Parametric multi-channel encoder|
ES2555136T3|2012-02-17|2015-12-29|Huawei Technologies Co., Ltd.|Parametric encoder to encode a multichannel audio signal|
JP2015517121A|2012-04-05|2015-06-18|ホアウェイ・テクノロジーズ・カンパニー・リミテッド|Inter-channel difference estimation method and spatial audio encoding device|
KR20140016780A|2012-07-31|2014-02-10|인텔렉추얼디스커버리 주식회사|A method for processing an audio signal and an apparatus for processing an audio signal|
ES2638391T3|2012-08-10|2017-10-20|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Encoder, decoder, system and procedure that employs a residual concept for parametric coding of an audio object|
EP2717261A1|2012-10-05|2014-04-09|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Encoder, decoder and methods for backward compatible multi-resolution spatial-audio-object-coding|
CA3031476C|2012-12-04|2021-03-09|Samsung Electronics Co., Ltd.|Audio providing apparatus and audio providing method|
US8804971B1|2013-04-30|2014-08-12|Dolby International Ab|Hybrid encoding of higher frequency and downmixed low frequency content of multichannel audio|
CN104299615B|2013-07-16|2017-11-17|华为技术有限公司|Level difference processing method and processing device between a kind of sound channel|
CN105336335B|2014-07-25|2020-12-08|杜比实验室特许公司|Audio object extraction with sub-band object probability estimation|
EP2980789A1|2014-07-30|2016-02-03|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for enhancing an audio signal, sound enhancing system|
ES2809677T3|2015-09-25|2021-03-05|Voiceage Corp|Method and system for encoding a stereo sound signal using encoding parameters from a primary channel to encode a secondary channel|
EP3301673A1|2016-09-30|2018-04-04|Nxp B.V.|Audio communication method and apparatus|
US10224042B2|2016-10-31|2019-03-05|Qualcomm Incorporated|Encoding of multiple audio signals|
CA3127805A1|2016-11-08|2018-05-17|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Apparatus and method for encoding or decoding a multichannel signal using a side gain and a residual gain|
CN109389985B|2017-08-10|2021-09-14|华为技术有限公司|Time domain stereo coding and decoding method and related products|
US10580420B2|2017-10-05|2020-03-03|Qualcomm Incorporated|Encoding or decoding of audio signals|
EP3550561A1|2018-04-06|2019-10-09|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Downmixer, audio encoder, method and computer program applying a phase value to a magnitude value|
US10904690B1|2019-12-15|2021-01-26|Nuvoton Technology Corporation|Energy and phase correlated audio channels mixer|
Legal status:
2017-11-07| B25D| Requested change of name of applicant approved|Owner name: KONINKLIJKE PHILIPS N.V (NL) |
2017-11-21| B25G| Requested change of headquarter approved|Owner name: KONINKLIJKE PHILIPS N.V (NL) |
2019-01-08| B06F| Objections, documents and/or translations needed after an examination request according art. 34 industrial property law|
2019-09-03| B06U| Preliminary requirement: requests with searches performed by other patent offices: suspension of the patent application procedure|
2020-05-26| B15K| Others concerning applications: alteration of classification|Free format text: THE PREVIOUS CLASSIFICATION WAS: G10L 19/00 Ipc: G10L 19/008 (2013.01), H04S 3/00 (2006.01), H04S 3 |
2020-05-26| B06A| Notification to applicant to reply to the report for non-patentability or inadequacy of the application according art. 36 industrial patent law|
2020-08-18| B09A| Decision: intention to grant|
2020-12-08| B16A| Patent or certificate of addition of invention granted|Free format text: TERM OF VALIDITY: 10 (TEN) YEARS COUNTED FROM 08/12/2020, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
EP09175771A|EP2323130A1|2009-11-12|2009-11-12|Parametric encoding and decoding|
EP09175771.6|2009-11-12|
PCT/IB2010/055025|WO2011058484A1|2009-11-12|2010-11-05|Parametric encoding and decoding|